
"Since 1972, the joint NASA/ U.S. Geological Survey Landsat series of Earth Observation satellites have continuously acquired images of the Earth’s land surface, providing uninterrupted data to help land managers and policymakers make informed decisions about natural resources and the environment." https://www.usgs.gov/landsat-missions
Landsat-5, 7, 8 and 9 Collection 2 products are managed by USGS, which makes Landsat data available via a number of services.
Surface Reflectance, Surface Temperature and Level-1 (top of atmosphere) products are available for each of Landsat-5, 7, 8 and 9.
EASI Asia ODC product names (Explorer):

| Name | Product | Information |
|--|--|--|
| USGS Landsat surface reflectance | landsat5_c2l2_sr | Landsat 5 Collection 2 Level-2 Surface Reflectance product. 30 m, UTM-based projection |
| | landsat7_c2l2_sr | Landsat 7 Collection 2 Level-2 Surface Reflectance, processed using LEDAPS. 30 m, UTM-based projection |
| | landsat8_c2l2_sr | Landsat 8 Collection 2 Level-2 Surface Reflectance, processed using LaSRC. 30 m, UTM-based projection |
| | landsat9_c2l2_sr | Landsat 9 Collection 2 Level-2 Surface Reflectance, processed using LaSRC. 30 m, UTM-based projection |
| USGS Landsat surface temperature | landsat5_c2l2_st | Landsat 5 Collection 2 Level-2 UTM Surface Temperature (ST) product |
| | landsat7_c2l2_st | Landsat 7 Collection 2 Level-2 UTM Surface Temperature (ST) product |
| | landsat8_c2l2_st | Landsat 8 Collection 2 Level-2 UTM Surface Temperature (ST) product |
| | landsat9_c2l2_st | Landsat 9 Collection 2 Level-2 UTM Surface Temperature (ST) product |
| USGS Landsat Level-1 (TOA) | landsat8_c2l1 | Landsat 8 Collection 2 Level-1 (top of atmosphere) |
| | landsat9_c2l1 | Landsat 9 Collection 2 Level-1 (top of atmosphere) |
Landsat products are read from the AWS STAC API. The data are in COG format. Two methods are shown in this notebook, each returning an essentially equivalent xarray Dataset:

- datacube database, which has a "cached" copy of the scenes and metadata (uses odc-tools)

Notes for using the AWS STAC API:

- `requester_pays = True`
- Data are hosted in `us-west-2` (consider egress and latency)
- caching-proxy settings (to help reduce egress and latency costs)

%matplotlib inline
# Data tools
import numpy as np
import xarray as xr
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
# Formatting options
pd.set_option("display.max_rows", None)
# plt.rcParams['figure.figsize'] = [12, 8]
# Datacube
import datacube
from datacube.utils import masking # https://github.com/opendatacube/datacube-core/blob/develop/datacube/utils/masking.py
from odc.algo import enum_to_bool # https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_masking.py
from datacube.utils.aws import configure_s3_access
# Notebook helper tools (in dea_tools or in this repo)
import sys
from os import environ
repo = f'{environ["HOME"]}/eocsi-hackathon-2022' # No easy way to get repo directory
if repo not in sys.path: sys.path.append(repo)
from tools import notebook_utils  # Used as notebook_utils.heading(), notebook_utils.display_table(), etc.
from tools.notebook_utils import xarray_object_size
try:
from dea_tools.plotting import display_map, rgb
from dea_tools.datahandling import mostcommon_crs
except ImportError:
# Local copy of selected dea_tools
from tools.datacube_utils import display_map, mostcommon_crs
rgb = None # Not copied or adapted yet
# Holoviews, Datashader and Bokeh
import hvplot.pandas
import hvplot.xarray
import holoviews as hv
import panel as pn
import colorcet as cc
import cartopy.crs as ccrs
from datashader import reductions
from holoviews import opts
# import geoviews as gv
# from holoviews.operation.datashader import rasterize
hv.extension('bokeh', logo=False);
# Initialise a datacube connection
dc = datacube.Datacube()
# Access AWS "requester-pays" buckets
configure_s3_access(aws_unsigned=False, requester_pays=True)
# Use EASI caching-proxy (applies to selected source buckets)
environ["AWS_HTTPS"] = "NO"
environ["GDAL_HTTP_PROXY"] = "easi-caching-proxy.caching-proxy:80"
print(f'Will use caching proxy at: {environ.get("GDAL_HTTP_PROXY")}')
Will use caching proxy at: easi-caching-proxy.caching-proxy:80
# Vietnam - Ha Long
latitude = (20.7, 21.1)
longitude = (106.7, 107.2)
time = ('2020-02-01', '2020-04-01')
display_map(longitude, latitude)
# Select a product
product = 'landsat8_c2l2_sr'
# Split the query to determine the most common CRS (essentially call find_datasets())
query = {
'x': longitude, # "x" axis bounds
'y': latitude, # "y" axis bounds
'time': time, # Any parsable date strings
}
# Most common CRS
# dea_tools: ValueError if query does not return datasets
native_crs = mostcommon_crs(dc, product, query)
print(f'Native CRS: {native_crs}')
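The `mostcommon_crs` helper finds the CRS shared by the largest number of datasets matching the query, so the load can use a native projection and avoid unnecessary reprojection. A minimal sketch of the underlying idea (the real helper calls `dc.find_datasets()` and inspects each dataset's CRS; the list below is illustrative):

```python
from collections import Counter

# Hypothetical CRS strings, one per dataset returned by a query
dataset_crs_list = ['EPSG:32648', 'EPSG:32648', 'EPSG:32649']

# Pick the most frequent CRS; ties are resolved arbitrarily by Counter
native_crs, count = Counter(dataset_crs_list).most_common(1)[0]
print(native_crs)  # EPSG:32648
```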
query.update({
'product': product, # Product name
'output_crs': native_crs, # EPSG code
    'resolution': (-30, 30),               # Target resolution (y, x); negative y for a north-up grid
'group_by': 'solar_day', # Scene ordering
# 'dask_chunks': {'x': 2048, 'y': 2048}, # Dask chunks
})
# Load data
data = dc.load(**query)
display(xarray_object_size(data))
display(data)
Native CRS: EPSG:32648
'Dataset size: 47.61 MB'
<xarray.Dataset>
Dimensions: (time: 1, y: 1498, x: 1753)
Coordinates:
* time (time) datetime64[ns] 2020-02-14T03:17:16.453607
* y (y) float64 2.29e+06 2.29e+06 2.29e+06 ... 2.335e+06 2.335e+06
* x (x) float64 6.766e+05 6.766e+05 ... 7.291e+05 7.291e+05
spatial_ref int32 32648
Data variables:
coastal (time, y, x) uint16 0 0 0 0 0 ... 43982 44124 44485 44905 45366
blue (time, y, x) uint16 0 0 0 0 0 ... 43958 44162 44487 44933 45458
green (time, y, x) uint16 0 0 0 0 0 ... 42835 42980 43240 43892 44440
red (time, y, x) uint16 0 0 0 0 0 ... 42556 42741 43012 43660 44289
nir08 (time, y, x) uint16 0 0 0 0 0 ... 41679 41879 42211 42768 43458
swir16 (time, y, x) uint16 0 0 0 0 0 ... 29493 29956 30389 32847 33978
swir22 (time, y, x) uint16 0 0 0 0 0 ... 24751 25260 25750 28564 29340
qa_pixel (time, y, x) uint16 1 1 1 1 1 ... 22280 22280 22280 22280 22280
qa_aerosol (time, y, x) uint8 1 1 1 1 1 1 1 ... 224 192 224 224 192 224
qa_radsat (time, y, x) uint16 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Attributes:
crs: EPSG:32648
    grid_mapping:  spatial_ref
# Optional. Filter Datasets prior to Load
# For example, to load only Tier 1 (best quality) datasets and exclude Tier 2 (good quality).
# See https://www.usgs.gov/core-science-systems/nli/landsat/landsat-collection-1 for a description of Landsat processing Tiers.
# 1. Remove load parameters from query object
# Borrowed from https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Tools/dea_tools/datahandling.py
# non_load_query = datacube_utils.dc_query_only(**query)
# dataset_list = dc.find_datasets(**non_load_query)
# 2. Check your query has results
# Borrowed from https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Tools/dea_tools/datahandling.py
# if len(dataset_list) == 0:
# print("No data available for query: ensure that "
# "the products specified have data for the "
# "time and location requested")
# 3. Check what details are available in each dataset
# See https://github.com/opendatacube/datacube-core/blob/develop/datacube/model/__init__.py, class Dataset
# display(dataset_list[0].__dict__)
# 4. Filter based on property of interest
# For Landsat, the Tier label is available in the 'landsat:collection_category' property
# dataset_list = [ds for ds in dataset_list if ds.metadata_doc['properties']['landsat:collection_category'] == 'T1']
# if len(dataset_list) == 0:
# print("No data available after filtering")
# 5. Update query object for next cell
# 'datasets' will be used instead of the standard database lookup
# query['datasets'] = dataset_list
# query
# Load data
data = dc.load(**query)
notebook_utils.heading(notebook_utils.xarray_object_size(data))
display(data)
# Calculate valid (not nodata) masks for each layer
valid_mask = masking.valid_data_mask(data)
notebook_utils.heading('Valid data masks for each variable')
display(valid_mask)
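`masking.valid_data_mask` compares each variable against its `nodata` attribute and returns a boolean Dataset. A toy illustration of the same test, assuming a hypothetical nodata value of 0:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([[0, 100], [200, 0]], dtype='uint16'))
da.attrs['nodata'] = 0  # assumed nodata value for this sketch

valid = da != da.attrs['nodata']  # boolean mask: True where data are valid
print(valid.values)
```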
Display the measurement definitions for the selected product.
Use `list_measurements` to show the details for a product, and `masking.describe_variable_flags` to show the flag definitions.
# Measurement definitions for the selected product
measurement_info = dc.list_measurements().loc[query['product']]
notebook_utils.heading(f'Measurement table for product: {query["product"]}')
notebook_utils.display_table(measurement_info) # Default pandas table display. Some rows or columns may be abbreviated
# Separate lists of measurement names and flag names
measurement_names = measurement_info[pd.isnull(measurement_info.flags_definition)].index
flag_names = measurement_info[pd.notnull(measurement_info.flags_definition)].index
notebook_utils.heading('Selected Measurement and Flag names')
notebook_utils.display_table(pd.DataFrame({
'group': ['Measurement names', 'Flag names'],
'names': [', '.join(measurement_names), ', '.join(flag_names)]
}))
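The split above is just a null test on the `flags_definition` column: measurement bands have no flag definitions, QA bands do. A toy DataFrame (hypothetical band names) shows the idea:

```python
import pandas as pd

df = pd.DataFrame({'flags_definition': [None, {'cloud': {}}, None]},
                  index=['red', 'qa_pixel', 'nir08'])

measurements = df[pd.isnull(df.flags_definition)].index   # rows without flag definitions
flags = df[pd.notnull(df.flags_definition)].index         # rows with flag definitions
print(list(measurements), list(flags))  # ['red', 'nir08'] ['qa_pixel']
```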
# Flag definitions
for flag in flag_names:
notebook_utils.heading(f'Flag definition table for flag name: {flag}')
notebook_utils.display_table(masking.describe_variable_flags(data[flag]))
# Make a QA flags image
flag_name = 'qa_pixel'  # Collection 2 name (the band was 'pixel_qa' in Collection 1)
flag_data = data[[flag_name]].where(valid_mask[flag_name]).persist() # Dataset
display(flag_data)
##
## Drat!
## Geoviews/Cartopy projection from epsg:32649 to PlateCarree doesn't work (works for S2 UTM)
##
# These options manipulate the color map and colorbar to show the categories for this product
options = {
'title': f'Flag data for: {query["product"]} ({flag_name})',
'cmap': cc.rainbow,
'colorbar': True,
'width': 700,
'height': 450,
'aspect': 'equal',
'tools': ['hover'],
}
# Set the Dataset CRS
plot_crs = native_crs
if plot_crs == 'epsg:4326':
plot_crs = ccrs.PlateCarree()
# Native data and coastline overlay:
# - Comment `crs`, `projection`, `coastline` to plot in native_crs coords
# TODO: Update the axis labels to 'longitude', 'latitude' if `coastline` is used
quality_plot = flag_data.hvplot.image(
x = 'x', y = 'y', # Dataset x,y dimension names
rasterize = True, # Use Datashader
aggregator = reductions.mode(), # Datashader selects mode value, requires 'hv.Image'
precompute = True, # Datashader precomputes what it can
crs = plot_crs, # Dataset CRS
projection = ccrs.PlateCarree(), # Output Projection (ccrs.PlateCarree() when coastline=True)
coastline = '10m', # Coastline = '10m'/'50m'/'110m'
).options(opts.Image(**options)).hist()
# display(quality_plot)
# Optional: Change the default time slider to a dropdown list, https://stackoverflow.com/a/54912917
fig = pn.panel(quality_plot, widgets={'time': pn.widgets.Select}) # widget_location='top_left'
display(fig)
# Create mask layer
# "L3 Mask Default"
good_pixel_flags = {
'snow': 'no_snow', # 'no_snow', 'snow'
# 'clear': 'clear_land', # 'no_clear_land', 'clear_land'
'cloud': 'no_cloud', # 'no_cloud', 'cloud'
# 'water': 'water', # 'no_water', 'water'
'nodata': False, # False, True
'cloud_shadow': 'no_cloud_shadow', # 'no_cloud_shadow', 'cloud_shadow'
# 'cloud_confidence': 'medium', # 'none', 'low', 'medium', 'high'
# 'cirrus_confidence': 'medium', # 'none', 'low', 'medium', 'high'
# 'terrain_occlusion': 'no_occlusion', # 'no_occlusion', 'occlusion'
}
good_pixel_mask = masking.make_mask(data[flag_name], **good_pixel_flags) # -> bool array
display(good_pixel_mask) # Type: bool
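`masking.make_mask` resolves the named flags to bit tests on the underlying integer band. For Landsat Collection 2 `QA_PIXEL`, fill is bit 0, cloud bit 3, cloud shadow bit 4 and snow bit 5, so the equivalent manual test is roughly (illustrative values, not real scene data):

```python
import numpy as np

qa = np.array([0b00000001, 0b00001000, 0b00000000])  # fill, cloud, clear

fill         = (qa >> 0) & 1
cloud        = (qa >> 3) & 1
cloud_shadow = (qa >> 4) & 1
snow         = (qa >> 5) & 1

# Good pixels have none of the unwanted bits set
good = (fill == 0) & (cloud == 0) & (cloud_shadow == 0) & (snow == 0)
print(good)  # [False False  True]
```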
# Define scaling: USGS Collection 2 Level-2 surface reflectance
# reflectance = DN * 0.0000275 - 0.2
scale = 0.0000275
offset = -0.2
# Select a layer and apply masking and scaling, then persist in dask
layer_name = 'nir08'  # Measurement name in this product (see the Data variables listed above)
# Apply valid mask and good pixel mask
layer = data[[layer_name]].where(valid_mask[layer_name] & good_pixel_mask) * scale + offset
layer = layer.persist()
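For USGS Collection 2 Level-2 surface reflectance, the published scaling is reflectance = DN * 0.0000275 - 0.2, which maps the valid DN range (7273 to 43636) to roughly 0.0 to 1.0. A quick arithmetic check with an arbitrary digital number:

```python
scale, offset = 0.0000275, -0.2  # USGS Collection 2 Level-2 SR scaling

dn = 21364  # example digital number (illustrative)
reflectance = dn * scale + offset
print(reflectance)  # ~0.38751
```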
##
## Drat!
## Geoviews/Cartopy projection from epsg:32649 to PlateCarree doesn't work (works for S2 UTM)
##
# Generate a plot
options = {
'title': f'{query["product"]}: {layer_name}',
'width': 1000,
'height': 450,
'aspect': 'equal',
'cmap': cc.rainbow,
    'clim': (0, 0.5), # Limit the color range depending on the layer_name
'colorbar': True,
'tools': ['hover'],
}
# Set the Dataset CRS
plot_crs = native_crs
if plot_crs == 'epsg:4326':
plot_crs = ccrs.PlateCarree()
# Native data and coastline overlay:
# - Comment `crs`, `projection`, `coastline` to plot in native_crs coords
# TODO: Update the axis labels to 'longitude', 'latitude' if `coastline` is used
layer_plot = layer.hvplot.image(
x = 'x', y = 'y', # Dataset x,y dimension names
rasterize = True, # Use Datashader
aggregator = reductions.mean(), # Datashader selects mean value
precompute = True, # Datashader precomputes what it can
crs = plot_crs, # Dataset crs
projection = ccrs.PlateCarree(), # Output projection (use ccrs.PlateCarree() when coastline=True)
coastline='10m', # Coastline = '10m'/'50m'/'110m'
).options(opts.Image(**options)).hist(bin_range = options['clim'])
# display(layer_plot)
# Optional: Change the default time slider to a dropdown list, https://stackoverflow.com/a/54912917
fig = pn.panel(layer_plot, widgets={'time': pn.widgets.Select}) # widget_location='top_left'
display(fig)
# Good image of Sarawak: 2020-01-27 0555
Reference material
# Filter datasets
# From load_ard()
datasets = dc.find_datasets(**query)  # 'product' is already a key in query
# Remove datasets after the Landsat 7 SLC failure, May 31 2003.
if product in ('ga_ls7e_ard_3', 'usgs_espa_ls7c1_sr'):
    datasets = [i for i in datasets if
                normalise_dt(i.time.begin) <  # normalise_dt comes from dea_tools.datahandling
                datetime(2003, 5, 31)]
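The SLC-off cut-off above is a plain datetime comparison over the dataset list; a self-contained sketch with hypothetical acquisition dates:

```python
from datetime import datetime

slc_failure = datetime(2003, 5, 31)  # Landsat 7 Scan Line Corrector failure date
acquisitions = [datetime(2002, 12, 25), datetime(2003, 4, 1), datetime(2003, 6, 15)]

# Keep only acquisitions from before the SLC failure
pre_slc_off = [d for d in acquisitions if d < slc_failure]
print(len(pre_slc_off))  # 2
```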
# Masking and Scaling
# dea-notebooks/Real_world_examples/ARD_Intercomparison/utilities/util.py
ls8_USGS_cloud_pixel_qa_value = [324, 352, 368, 386, 388, 392, 400, 416,
432, 480, 864, 880, 898, 900, 904, 928,
944, 992, 1350]
non_ls8_USGS_cloud_pixel_qa_value = [72, 96, 112, 130, 132, 136, 144, 160,
176, 224]
non_ls8_USGS_sr_cloud_qa_value = [2, 4, 12, 20, 34, 36, 52]
mask_data = data[mask_band]
nodata_value = mask_data.nodata
nodata_cloud_value = []
if 'usgs' in source_prod:
if 'ls8' in source_prod:
nodata_cloud_value = ls8_USGS_cloud_pixel_qa_value
else:
if mask_band == 'sr_cloud_qa':
nodata_cloud_value = non_ls8_USGS_sr_cloud_qa_value
else:
nodata_cloud_value = non_ls8_USGS_cloud_pixel_qa_value
nodata_cloud_value.append(nodata_value)
nodata_cloud = np.isin(mask_data, nodata_cloud_value)
cld_free = data.where(~nodata_cloud).dropna(dim='time', how='all')
# remove nodata for the pixel of interest
cld_free_valid = masking.mask_invalid_data(cld_free)
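`np.isin` builds the combined nodata-and-cloud mask by membership testing against the list of bad QA values. A minimal numpy illustration with hypothetical pixel_qa values:

```python
import numpy as np

qa = np.array([72, 66, 96, 1, 130])  # hypothetical pixel_qa values
bad_values = [72, 96, 130, 1]        # cloud values plus the nodata value

bad = np.isin(qa, bad_values)        # True where the pixel should be masked out
print(bad)  # [ True False  True  True  True]
```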
# Visualisation